The emergence of 1-bit large language models (LLMs) has sparked significant interest, promising substantial efficiency gains through extreme quantization. However, these benefits are inherently limited by the portion of the model that can be quantized. Specifically, 1-bit quantization typically targets only the projection layers, while the attention mechanisms remain in higher precision, creating a potential throughput bottleneck. To address this, we present an adaptation of Amdahl's Law tailored to LLMs, offering a quantitative framework for understanding the throughput limits of extreme quantization. Our analysis reveals that improvements in quantization deliver substantial throughput gains only to the extent that they address the throughput-constrained sections of the model. Through extensive experiments across diverse model architectures and hardware platforms, we highlight key trade-offs and performance ceilings, providing a roadmap for future research aimed at maximizing LLM throughput through more holistic quantization strategies.
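The Amdahl's-Law framing in this abstract can be sketched numerically. The function below is the generic form of the law applied to the quantizable fraction of inference time; the fractions and speedup factors used here are illustrative placeholders, not the paper's measured values:

```python
# Amdahl's Law applied to LLM quantization (illustrative sketch).
# If a fraction p of inference time is spent in the quantizable projection
# layers, and 1-bit quantization speeds those layers up by a factor s, then:
#   speedup = 1 / ((1 - p) + p / s)

def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the work is accelerated by factor s."""
    return 1.0 / ((1.0 - p) + p / s)

# Even with an unbounded speedup on the quantized part, overall throughput
# is capped by 1 / (1 - p), i.e. by the unquantized attention layers.
for p in (0.5, 0.8, 0.95):
    print(f"p={p}: s=10 -> {amdahl_speedup(p, 10):.2f}x, ceiling -> {1 / (1 - p):.1f}x")
```

This makes the abstract's point concrete: at p = 0.5, no amount of projection-layer quantization can exceed a 2x end-to-end gain.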
-
This paper investigates the combined potential of neuromorphic and edge computing to develop a flexible machine learning (ML) system designed for processing data from dynamic vision sensors. We build and train hybrid models that integrate spiking neural networks (SNNs) and artificial neural networks (ANNs) using the PyTorch and Lava frameworks. We explore the effects of quantization on ANN models to assess their impact on both accuracy and energy efficiency. Additionally, we address the challenges of deploying hybrid models on hardware by implementing individual components on specific edge platforms. We also propose an accumulator circuit to bridge the spiking and non-spiking domains. Comprehensive performance analyses are conducted on a heterogeneous system of neuromorphic and edge AI hardware, assessing accuracy, latency, and energy consumption. Our results show that hybrid spiking networks improve accuracy and energy efficiency. Moreover, we find that quantization benefits hybrid networks, further reducing energy consumption while boosting accuracy.
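The ANN quantization explored above can be illustrated with a minimal sketch. The symmetric per-tensor int8 scheme below is an assumption for illustration; the paper's actual quantization scheme may differ:

```python
import numpy as np

# Minimal sketch of post-training symmetric int8 quantization (an assumed
# scheme, not necessarily the one used in the paper). Weights are mapped to
# int8 with a single per-tensor scale factor.

def quantize_int8(w: np.ndarray):
    """Quantize a float tensor to int8 with a symmetric per-tensor scale."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from int8 values and the scale."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 0.9], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Per-element reconstruction error is bounded by half a quantization step (scale / 2).
```

Storing `q` instead of `w` cuts weight memory 4x versus float32, which is the kind of saving that translates into the energy reductions reported here.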
-
Peng, Lu; Vaisband, Boris; Chen, Fan; Zhou, Peipei; Kvatinsky, Shahar; Xie, Jiafeng (Eds.)
In this paper, we propose the CrossNAS framework, an automated approach for exploring a vast, multidimensional search space that spans various design abstraction layers—circuits, architecture, and systems—to optimize the deployment of machine learning workloads on analog processing-in-memory (PIM) systems. CrossNAS leverages the single-path one-shot weight-sharing strategy combined with evolutionary search for the first time in the context of PIM system mapping and optimization. CrossNAS sets a new benchmark for PIM neural architecture search (NAS), outperforming previous methods in both accuracy and energy efficiency while maintaining comparable or shorter search times.
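The evolutionary search at the core of CrossNAS can be sketched in miniature. The search space and fitness function below are hypothetical stand-ins; the actual CrossNAS objective scores candidates by accuracy and energy efficiency under a one-shot weight-sharing supernet:

```python
import random

# Toy evolutionary search over a discrete architecture space (illustrative
# only; the search space and fitness here are stand-ins, not CrossNAS's).
SEARCH_SPACE = [(k, w) for k in (3, 5, 7) for w in (16, 32, 64)]  # (kernel, width)

def fitness(arch) -> float:
    # Stand-in objective; CrossNAS would evaluate accuracy/energy on the supernet.
    k, w = arch
    return -abs(k - 5) - abs(w - 32) / 16

def evolve(pop_size: int = 8, generations: int = 20, seed: int = 0):
    rng = random.Random(seed)
    pop = [rng.choice(SEARCH_SPACE) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # keep the fittest half
        children = [rng.choice(SEARCH_SPACE)    # "mutate" by resampling
                    for _ in parents]
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
```

Because candidates share weights in a one-shot supernet, each fitness evaluation is cheap, which is what makes this loop tractable for NAS.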
-
In this paper, we present a novel hybrid computing architecture designed to accelerate inference in 1-bit large language models (LLMs). Our approach combines the strengths of analog in-memory computing (IMC) and digital systolic arrays to address the diverse precision requirements across different layers of 1-bit LLMs. Specifically, we utilize analog IMC to accelerate low-precision matrix multiplication (MatMul) operations within the projection layers, which are naturally amenable to extreme quantization. Meanwhile, digital systolic arrays are employed to efficiently handle high-precision MatMul operations in the attention heads, preserving accuracy where precision is most critical. By partitioning the computational workload based on precision needs, our hybrid architecture increases throughput and energy efficiency. Experimental evaluations demonstrate that our design delivers up to an 80x improvement in tokens processed per second and achieves a 70% increase in energy efficiency (tokens per joule) when compared to conventional digital hardware accelerators.
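The precision-based workload partitioning described above can be modeled with a simple back-of-envelope latency calculation. All rates and work fractions below are assumed placeholder numbers, not measurements from the paper:

```python
# Back-of-envelope model of the hybrid pipeline (assumed numbers, for
# illustration only): each token's MatMul work is split between analog IMC
# (projection layers) and digital systolic arrays (attention heads).

def hybrid_tokens_per_sec(work_proj: float, work_attn: float,
                          rate_imc: float, rate_digital: float) -> float:
    """Tokens/s when the two partitions execute sequentially per token."""
    time_per_token = work_proj / rate_imc + work_attn / rate_digital
    return 1.0 / time_per_token

# Hypothetical split: 80% of the work in projections, 20% in attention,
# with IMC running 10x faster than the digital arrays on its partition.
tps = hybrid_tokens_per_sec(work_proj=0.8, work_attn=0.2,
                            rate_imc=100.0, rate_digital=10.0)
```

The same Amdahl-style structure appears here as in the quantization analysis: even a very fast analog partition leaves overall throughput bounded by the digital attention path.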